Data Processing

Data processing and feature extraction play a key role in machine learning pipelines.

  • Kedro is a workflow development tool that helps you build data pipelines that are robust, scalable, deployable, reproducible and versioned. [GitHub]
  • Google/jax: Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more [GitHub]
  • CuPy: NumPy-like API accelerated with CUDA [GitHub]
  • Modin: Speed up your Pandas workflows by changing a single line of code [GitHub]
  • Weld: Weld is a runtime for improving the performance of data-intensive applications. [Project Website]
  • Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines [Project Website]
    • Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, Saman Amarasinghe. (PLDI 2013)
    • Summary: Halide is a programming language designed to make it easier to write high-performance image and array processing code on modern machines.
  • a-mma/AquilaDB: Resilient, Replicated, Decentralized, Host neutral vector database to store Feature Vectors along with JSON Metadata. Do similarity search from anywhere, even from the darkest rifts of Aquila. Production ready solution for Machine Learning engineers and Data scientists. [GitHub]
  • ShannonAI/service-streamer: Boosting your Web Services of Deep Learning Applications. [GitHub]
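
As a rough illustration of what the JAX entry above means by "composable transformations", here is a minimal sketch. It assumes `jax` is installed; the linear model, the `mse_loss` function, and the toy data are hypothetical and chosen only to show `jax.grad` (differentiate), `jax.vmap` (vectorize), and `jax.jit` (compile) working together.

```python
import jax
import jax.numpy as jnp


def mse_loss(w, x, y):
    """Mean squared error of a linear model x @ w against targets y."""
    pred = x @ w
    return jnp.mean((pred - y) ** 2)


# grad: differentiate the loss with respect to its first argument (w).
grad_loss = jax.grad(mse_loss)

# jit: compile the gradient function with XLA for the available backend.
fast_grad_loss = jax.jit(grad_loss)

# vmap: compute one squared error per example by mapping over the leading
# axis of x and y while holding w fixed (in_axes=None for w).
per_example_loss = jax.vmap(
    lambda w, xi, yi: (jnp.dot(xi, w) - yi) ** 2,
    in_axes=(None, 0, 0),
)

# Toy data (hypothetical): three examples with two features each.
x = jnp.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = jnp.array([1.0, 2.0, 3.0])
w = jnp.zeros(2)

g = fast_grad_loss(w, x, y)          # gradient of the loss, shape (2,)
losses = per_example_loss(w, x, y)   # per-example squared errors, shape (3,)
```

The transformations nest freely: here `jit` wraps the function returned by `grad`, and `vmap` could be wrapped in `jit` in the same way.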